{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Meta-Analysis in statsmodels\n",
    "\n",
    "Statsmodels include basic methods for meta-analysis. This notebook illustrates the current usage.\n",
    "\n",
    "Status: The results have been verified against R meta and metafor packages. However, the API is still experimental and will still change. Some options for additional methods that are available in R meta and metafor are missing.\n",
    "\n",
    "The support for meta-analysis has 3 parts:\n",
    "\n",
    "- effect size functions: this currently includes\n",
    "  ``effectsize_smd`` computes effect size and their standard errors for standardized mean difference,  \n",
    "  ``effectsize_2proportions`` computes effect sizes for comparing two independent proportions using risk difference, (log) risk ratio, (log) odds-ratio or arcsine square root transformation\n",
    "- The `combine_effects` computes fixed and random effects estimate for the overall mean or effect. The returned results instance includes a forest plot function.\n",
    "- helper functions to estimate the random effect variance, tau-squared\n",
    "\n",
    "The estimate of the overall effect size in `combine_effects` can also be performed using WLS or GLM with var_weights.\n",
    "\n",
    "Finally, the meta-analysis functions currently do not include the Mantel-Hanszel method. However, the fixed effects results can be computed directly using `StratifiedTable` as illustrated below."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%matplotlib inline"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "from scipy import stats, optimize\n",
    "\n",
    "from statsmodels.regression.linear_model import WLS\n",
    "from statsmodels.genmod.generalized_linear_model import GLM\n",
    "\n",
    "from statsmodels.stats.meta_analysis import (\n",
    "    effectsize_smd,\n",
    "    effectsize_2proportions,\n",
    "    combine_effects,\n",
    "    _fit_tau_iterative,\n",
    "    _fit_tau_mm,\n",
    "    _fit_tau_iter_mm,\n",
    ")\n",
    "\n",
    "# increase line length for pandas\n",
    "pd.set_option(\"display.width\", 100)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Example"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "data = [\n",
    "    [\"Carroll\", 94, 22, 60, 92, 20, 60],\n",
    "    [\"Grant\", 98, 21, 65, 92, 22, 65],\n",
    "    [\"Peck\", 98, 28, 40, 88, 26, 40],\n",
    "    [\"Donat\", 94, 19, 200, 82, 17, 200],\n",
    "    [\"Stewart\", 98, 21, 50, 88, 22, 45],\n",
    "    [\"Young\", 96, 21, 85, 92, 22, 85],\n",
    "]\n",
    "colnames = [\"study\", \"mean_t\", \"sd_t\", \"n_t\", \"mean_c\", \"sd_c\", \"n_c\"]\n",
    "rownames = [i[0] for i in data]\n",
    "dframe1 = pd.DataFrame(data, columns=colnames)\n",
    "rownames"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "mean2, sd2, nobs2, mean1, sd1, nobs1 = np.asarray(\n",
    "    dframe1[[\"mean_t\", \"sd_t\", \"n_t\", \"mean_c\", \"sd_c\", \"n_c\"]]\n",
    ").T\n",
    "rownames = dframe1[\"study\"]\n",
    "rownames.tolist()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "np.array(nobs1 + nobs2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### estimate effect size standardized mean difference"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "eff, var_eff = effectsize_smd(mean2, sd2, nobs2, mean1, sd1, nobs1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Using one-step chi2, DerSimonian-Laird estimate for random effects variance tau\n",
    "\n",
    "Method option for random effect `method_re=\"chi2\"` or `method_re=\"dl\"`, both names are accepted.\n",
    "This is commonly referred to as the DerSimonian-Laird method, it is based on a moment estimator based on pearson chi2 from the fixed effects estimate."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "res3 = combine_effects(eff, var_eff, method_re=\"chi2\", use_t=True, row_names=rownames)\n",
    "# TODO: we still need better information about conf_int of individual samples\n",
    "# We don't have enough information in the model for individual confidence intervals\n",
    "# if those are not based on normal distribution.\n",
    "res3.conf_int_samples(nobs=np.array(nobs1 + nobs2))\n",
    "print(res3.summary_frame())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "res3.cache_ci"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "res3.method_re"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "fig = res3.plot_forest()\n",
    "fig.set_figheight(6)\n",
    "fig.set_figwidth(6)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "res3 = combine_effects(eff, var_eff, method_re=\"chi2\", use_t=False, row_names=rownames)\n",
    "# TODO: we still need better information about conf_int of individual samples\n",
    "# We don't have enough information in the model for individual confidence intervals\n",
    "# if those are not based on normal distribution.\n",
    "res3.conf_int_samples(nobs=np.array(nobs1 + nobs2))\n",
    "print(res3.summary_frame())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Using iterated, Paule-Mandel estimate for random effects variance tau\n",
    "\n",
    "The method commonly referred to as Paule-Mandel estimate is a method of moment estimate for the random effects variance that iterates between mean and variance estimate until convergence.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "res4 = combine_effects(\n",
    "    eff, var_eff, method_re=\"iterated\", use_t=False, row_names=rownames\n",
    ")\n",
    "res4_df = res4.summary_frame()\n",
    "print(\"method RE:\", res4.method_re)\n",
    "print(res4.summary_frame())\n",
    "fig = res4.plot_forest()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Example Kacker interlaboratory mean\n",
    "\n",
    "In this example the effect size is the mean of measurements in a lab. We combine the estimates from several labs to estimate and overall average."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "eff = np.array([61.00, 61.40, 62.21, 62.30, 62.34, 62.60, 62.70, 62.84, 65.90])\n",
    "var_eff = np.array(\n",
    "    [0.2025, 1.2100, 0.0900, 0.2025, 0.3844, 0.5625, 0.0676, 0.0225, 1.8225]\n",
    ")\n",
    "rownames = [\"PTB\", \"NMi\", \"NIMC\", \"KRISS\", \"LGC\", \"NRC\", \"IRMM\", \"NIST\", \"LNE\"]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "res2_DL = combine_effects(eff, var_eff, method_re=\"dl\", use_t=True, row_names=rownames)\n",
    "print(\"method RE:\", res2_DL.method_re)\n",
    "print(res2_DL.summary_frame())\n",
    "fig = res2_DL.plot_forest()\n",
    "fig.set_figheight(6)\n",
    "fig.set_figwidth(6)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "res2_PM = combine_effects(eff, var_eff, method_re=\"pm\", use_t=True, row_names=rownames)\n",
    "print(\"method RE:\", res2_PM.method_re)\n",
    "print(res2_PM.summary_frame())\n",
    "fig = res2_PM.plot_forest()\n",
    "fig.set_figheight(6)\n",
    "fig.set_figwidth(6)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Meta-analysis of proportions\n",
    "\n",
    "In the following example the random effect variance tau is estimated to be zero. \n",
    "I then change two counts in the data, so the second example has random effects variance greater than zero."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import io"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ss = \"\"\"\\\n",
    "    study,nei,nci,e1i,c1i,e2i,c2i,e3i,c3i,e4i,c4i\n",
    "    1,19,22,16.0,20.0,11,12,4.0,8.0,4,3\n",
    "    2,34,35,22.0,22.0,18,12,15.0,8.0,15,6\n",
    "    3,72,68,44.0,40.0,21,15,10.0,3.0,3,0\n",
    "    4,22,20,19.0,12.0,14,5,5.0,4.0,2,3\n",
    "    5,70,32,62.0,27.0,42,13,26.0,6.0,15,5\n",
    "    6,183,94,130.0,65.0,80,33,47.0,14.0,30,11\n",
    "    7,26,50,24.0,30.0,13,18,5.0,10.0,3,9\n",
    "    8,61,55,51.0,44.0,37,30,19.0,19.0,11,15\n",
    "    9,36,25,30.0,17.0,23,12,13.0,4.0,10,4\n",
    "    10,45,35,43.0,35.0,19,14,8.0,4.0,6,0\n",
    "    11,246,208,169.0,139.0,106,76,67.0,42.0,51,35\n",
    "    12,386,141,279.0,97.0,170,46,97.0,21.0,73,8\n",
    "    13,59,32,56.0,30.0,34,17,21.0,9.0,20,7\n",
    "    14,45,15,42.0,10.0,18,3,9.0,1.0,9,1\n",
    "    15,14,18,14.0,18.0,13,14,12.0,13.0,9,12\n",
    "    16,26,19,21.0,15.0,12,10,6.0,4.0,5,1\n",
    "    17,74,75,,,42,40,,,23,30\"\"\"\n",
    "df3 = pd.read_csv(io.StringIO(ss))\n",
    "df_12y = df3[[\"e2i\", \"nei\", \"c2i\", \"nci\"]]\n",
    "# TODO: currently 1 is reference, switch labels\n",
    "count1, nobs1, count2, nobs2 = df_12y.values.T\n",
    "dta = df_12y.values.T"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "eff, var_eff = effectsize_2proportions(*dta, statistic=\"rd\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "eff, var_eff"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "res5 = combine_effects(\n",
    "    eff, var_eff, method_re=\"iterated\", use_t=False\n",
    ")  # , row_names=rownames)\n",
    "res5_df = res5.summary_frame()\n",
    "print(\"method RE:\", res5.method_re)\n",
    "print(\"RE variance tau2:\", res5.tau2)\n",
    "print(res5.summary_frame())\n",
    "fig = res5.plot_forest()\n",
    "fig.set_figheight(8)\n",
    "fig.set_figwidth(6)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### changing data to have positive random effects variance"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "dta_c = dta.copy()\n",
    "dta_c.T[0, 0] = 18\n",
    "dta_c.T[1, 0] = 22\n",
    "dta_c.T"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "eff, var_eff = effectsize_2proportions(*dta_c, statistic=\"rd\")\n",
    "res5 = combine_effects(\n",
    "    eff, var_eff, method_re=\"iterated\", use_t=False\n",
    ")  # , row_names=rownames)\n",
    "res5_df = res5.summary_frame()\n",
    "print(\"method RE:\", res5.method_re)\n",
    "print(res5.summary_frame())\n",
    "fig = res5.plot_forest()\n",
    "fig.set_figheight(8)\n",
    "fig.set_figwidth(6)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "res5 = combine_effects(eff, var_eff, method_re=\"chi2\", use_t=False)\n",
    "res5_df = res5.summary_frame()\n",
    "print(\"method RE:\", res5.method_re)\n",
    "print(res5.summary_frame())\n",
    "fig = res5.plot_forest()\n",
    "fig.set_figheight(8)\n",
    "fig.set_figwidth(6)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Replicate fixed effect analysis using GLM with var_weights\n",
    "\n",
    "`combine_effects` computes weighted average estimates which can be replicated using GLM with var_weights or with WLS.\n",
    "The `scale` option in `GLM.fit` can be used to replicate fixed meta-analysis with fixed and with HKSJ/WLS scale"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from statsmodels.genmod.generalized_linear_model import GLM"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "eff, var_eff = effectsize_2proportions(*dta_c, statistic=\"or\")\n",
    "res = combine_effects(eff, var_eff, method_re=\"chi2\", use_t=False)\n",
    "res_frame = res.summary_frame()\n",
    "print(res_frame.iloc[-4:])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We need to fix scale=1 in order to replicate standard errors for the usual meta-analysis."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "weights = 1 / var_eff\n",
    "mod_glm = GLM(eff, np.ones(len(eff)), var_weights=weights)\n",
    "res_glm = mod_glm.fit(scale=1.0)\n",
    "print(res_glm.summary().tables[1])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# check results\n",
    "res_glm.scale, res_glm.conf_int() - res_frame.loc[\n",
    "    \"fixed effect\", [\"ci_low\", \"ci_upp\"]\n",
    "].values"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Using HKSJ variance adjustment in meta-analysis is equivalent to estimating the scale using pearson chi2, which is also the default for the gaussian family."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "res_glm = mod_glm.fit(scale=\"x2\")\n",
    "print(res_glm.summary().tables[1])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# check results\n",
    "res_glm.scale, res_glm.conf_int() - res_frame.loc[\n",
    "    \"fixed effect\", [\"ci_low\", \"ci_upp\"]\n",
    "].values"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Mantel-Hanszel odds-ratio using contingency tables\n",
    "\n",
    "The fixed effect for the log-odds-ratio using the Mantel-Hanszel can be directly computed using StratifiedTable.\n",
    "\n",
    "We need to create a 2 x 2 x k contingency table to be used with `StratifiedTable`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "t, nt, c, nc = dta_c\n",
    "counts = np.column_stack([t, nt - t, c, nc - c])\n",
    "ctables = counts.T.reshape(2, 2, -1)\n",
    "ctables[:, :, 0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "counts[0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "dta_c.T[0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import statsmodels.stats.api as smstats"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "st = smstats.StratifiedTable(ctables.astype(np.float64))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "compare pooled log-odds-ratio and standard error to R meta package"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "st.logodds_pooled, st.logodds_pooled - 0.4428186730553189  # R meta"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "st.logodds_pooled_se, st.logodds_pooled_se - 0.08928560091027186  # R meta"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "st.logodds_pooled_confint()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(st.test_equal_odds())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(st.test_null_odds())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "check conversion to stratified contingency table\n",
    "\n",
    "Row sums of each table are the sample sizes for treatment and control experiments"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ctables.sum(1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "nt, nc"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Results from R meta package**\n",
    "\n",
    "```\n",
    "> res_mb_hk = metabin(e2i, nei, c2i, nci, data=dat2, sm=\"OR\", Q.Cochrane=FALSE, method=\"MH\", method.tau=\"DL\", hakn=FALSE, backtransf=FALSE)\n",
    "> res_mb_hk\n",
    "     logOR            95%-CI %W(fixed) %W(random)\n",
    "1   2.7081 [ 0.5265; 4.8896]       0.3        0.7\n",
    "2   1.2567 [ 0.2658; 2.2476]       2.1        3.2\n",
    "3   0.3749 [-0.3911; 1.1410]       5.4        5.4\n",
    "4   1.6582 [ 0.3245; 2.9920]       0.9        1.8\n",
    "5   0.7850 [-0.0673; 1.6372]       3.5        4.4\n",
    "6   0.3617 [-0.1528; 0.8762]      12.1       11.8\n",
    "7   0.5754 [-0.3861; 1.5368]       3.0        3.4\n",
    "8   0.2505 [-0.4881; 0.9892]       6.1        5.8\n",
    "9   0.6506 [-0.3877; 1.6889]       2.5        3.0\n",
    "10  0.0918 [-0.8067; 0.9903]       4.5        3.9\n",
    "11  0.2739 [-0.1047; 0.6525]      23.1       21.4\n",
    "12  0.4858 [ 0.0804; 0.8911]      18.6       18.8\n",
    "13  0.1823 [-0.6830; 1.0476]       4.6        4.2\n",
    "14  0.9808 [-0.4178; 2.3795]       1.3        1.6\n",
    "15  1.3122 [-1.0055; 3.6299]       0.4        0.6\n",
    "16 -0.2595 [-1.4450; 0.9260]       3.1        2.3\n",
    "17  0.1384 [-0.5076; 0.7844]       8.5        7.6\n",
    "\n",
    "Number of studies combined: k = 17\n",
    "\n",
    "                      logOR           95%-CI    z  p-value\n",
    "Fixed effect model   0.4428 [0.2678; 0.6178] 4.96 < 0.0001\n",
    "Random effects model 0.4295 [0.2504; 0.6086] 4.70 < 0.0001\n",
    "\n",
    "Quantifying heterogeneity:\n",
    " tau^2 = 0.0017 [0.0000; 0.4589]; tau = 0.0410 [0.0000; 0.6774];\n",
    " I^2 = 1.1% [0.0%; 51.6%]; H = 1.01 [1.00; 1.44]\n",
    "\n",
    "Test of heterogeneity:\n",
    "     Q d.f. p-value\n",
    " 16.18   16  0.4404\n",
    "\n",
    "Details on meta-analytical method:\n",
    "- Mantel-Haenszel method\n",
    "- DerSimonian-Laird estimator for tau^2\n",
    "- Jackson method for confidence interval of tau^2 and tau\n",
    "\n",
    "> res_mb_hk$TE.fixed\n",
    "[1] 0.4428186730553189\n",
    "> res_mb_hk$seTE.fixed\n",
    "[1] 0.08928560091027186\n",
    "> c(res_mb_hk$lower.fixed, res_mb_hk$upper.fixed)\n",
    "[1] 0.2678221109331694 0.6178152351774684\n",
    " \n",
    "```\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(st.summary())"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
